Negation’s Not Solved: Generalizability Versus Optimizability in Clinical Natural Language Processing

Authors

  • Stephen Wu
  • Timothy Miller
  • James Masanz
  • Matt Coarr
  • Scott Halgrim
  • David Carrell
  • Cheryl Clark
Abstract

A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been "solved." This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP.
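The in-domain versus cross-domain comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the trigger-word learner is a simple stand-in (it is not the Polarity Module introduced in the paper), the names `context_tokens`, `learn_triggers`, `predict`, and `f1` are hypothetical, and the two tiny corpora are hand-written toy sentences, not drawn from the four annotated corpora.

```python
from typing import List, Set, Tuple

# (sentence, concept, is_negated): hand-written toy examples, NOT the paper's data.
Example = Tuple[str, str, bool]

def context_tokens(sentence: str, concept: str, window: int = 4) -> List[str]:
    """Return up to `window` lowercase tokens that precede the concept mention."""
    tokens = sentence.lower().split()
    concept_tokens = concept.lower().split()
    n = len(concept_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == concept_tokens:
            return tokens[max(0, i - window):i]
    return []  # concept not found in the sentence

def learn_triggers(train: List[Example]) -> Set[str]:
    """Keep words seen before negated concepts but never before affirmed ones."""
    negated: Set[str] = set()
    affirmed: Set[str] = set()
    for sentence, concept, label in train:
        (negated if label else affirmed).update(context_tokens(sentence, concept))
    return negated - affirmed

def predict(sentence: str, concept: str, triggers: Set[str]) -> bool:
    """Predict 'negated' if any learned trigger precedes the concept."""
    return any(tok in triggers for tok in context_tokens(sentence, concept))

def f1(examples: List[Example], triggers: Set[str]) -> float:
    """F1 score for the 'negated' class."""
    tp = fp = fn = 0
    for sentence, concept, gold in examples:
        pred = predict(sentence, concept, triggers)
        tp += pred and gold
        fp += pred and not gold
        fn += (not pred) and gold
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Two toy "domains" that express negation with different vocabulary (assumed).
corpus_a: List[Example] = [
    ("patient denies chest pain", "chest pain", True),
    ("no evidence of pneumonia", "pneumonia", True),
    ("patient reports chest pain", "chest pain", False),
    ("chest x-ray shows pneumonia", "pneumonia", False),
]
corpus_b: List[Example] = [
    ("negative for fever", "fever", True),
    ("without edema", "edema", True),
    ("positive for fever", "fever", False),
    ("with edema", "edema", False),
]

corpora = {"A": corpus_a, "B": corpus_b}
for train_name, train in corpora.items():
    triggers = learn_triggers(train)
    for test_name, test in corpora.items():
        print(f"train={train_name} test={test_name} F1={f1(test, triggers):.2f}")
```

On these toy corpora the in-domain cells score perfectly and the cross-domain cells score zero, but only by construction: the two domains were written to share no negation vocabulary. The sketch shows the shape of the evaluation, not the size of the effect reported in the paper.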


Similar resources

Optimizability in Clinical Natural Language Processing

Authors Stephen Wu Timothy Miller James Masanz Matt Coarr Scott Halgrim David Carrell Cheryl Clark Affiliation Department of Health Sciences Research Mayo Clinic 200 First Street SW Rochester, MN 55905, USA b Children's Hospital Boston Informatics Program Harvard Medical School 300 Longwood Ave Boston, MA 02115, USA c The MITRE Corporation 202 Burlington Road Bedford, MA 01730, USA d Group Heal...


Processing Natural Language Software Requirement Specifications

Ambiguity in requirement specifications causes numerous problems; for example in defining customer/supplier contracts, ensuring the integrity of safety-critical systems, and analysing the implications of system change requests. A direct appeal to formal specification has not solved these problems, partly because of the restrictiveness and lack of habitability of formal languages. An alternative ap...


Unsupervised Domain Adaptation for Clinical Negation Detection

Detecting negated concepts in clinical texts is an important part of NLP information extraction systems. However, generalizability of negation systems is lacking, as cross-domain experiments suffer dramatic performance losses. We examine the performance of multiple unsupervised domain adaptation algorithms on clinical negation detection, finding only modest gains that fall well short of in-doma...


Score Generalizability of Writing Assessment: the Effect of Rater’s Gender

The score reliability of language performance tests has attracted increasing interest. Classical Test Theory cannot examine multiple sources of measurement error. Generalizability theory extends Classical Test Theory to provide a practical framework to identify and estimate multiple factors contributing to the total variance of measurement. Generalizability theory by using analysis of variance ...


A New Text Mining Method for Extracting User Context Information to Improve the Ranking of Search Engine Results

Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual and document material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...



Journal title:

Volume 9, Issue

Pages -

Publication date: 2014